\section{MSR4J Architecture}
\label{sec:archi}
The architecture of MSR4J is structured in two top-level layers, as depicted
 in \figurename~\ref{fig:archi1}: the \emph{infrastructure} and the \emph{superstructure}.

\begin{figure}[!t]
\centering
\includegraphics[scale=0.75]{generalArchitecture}
\caption{Overview of MSR4J Architecture}
\label{fig:archi1}
\end{figure}

The infrastructure consists of a single layer built on the 
Tinkerpop framework\footnote{\url{http://www.tinkerpop.com/}}.
Tinkerpop is a powerful stack of open source software products
for graph design, manipulation, traversal and querying, dataflow
transformation, and graph analysis algorithms.
Within this stack, the MSR4J core data model primarily relies on Frames, 
which \emph{``exposes the elements of a Blueprints graph as Java objects''}.
The intent of Frames is to enable designers to write software 
\emph{``in terms of domain objects and their relationships to each other, 
instead of writing software in terms of vertices and edges''}.

Blueprints is a property graph model with several database implementations.
According to its authors, Blueprints is analogous to JDBC, but for graph 
databases. The wide range of database implementations is an important feature
because it helps avoid vendor lock-in: there are connectors to Neo4J, OrientDB, Accumulo,
InfiniteGraph, MongoDB, Oracle NoSQL, Titan, etc.
The property graph model offered in Blueprints is a standard one, with 
vertices having a unique identifier and a set of properties (map of keys/values),
and edges having a unique identifier, a label that denotes the type of relationship
between two vertices, and a set of properties.
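
To make the model concrete, the following listing gives a minimal, dependency-free
sketch of a property graph as described above. It is an illustration only and is
\emph{not} the Blueprints API: the class names (\texttt{Element}, \texttt{Vertex},
\texttt{Edge}, \texttt{PropertyGraphDemo}), the property keys and the label string
are assumptions chosen for the example.

\begin{lstlisting}[caption={Illustrative sketch of a property graph (not the Blueprints API)}]
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Common part of vertices and edges: a unique identifier
// and a set of properties (map of keys/values).
abstract class Element {
    final String id;
    final Map<String, Object> properties = new LinkedHashMap<>();

    Element(String id) {
        this.id = id;
    }
}

class Vertex extends Element {
    final List<Edge> outEdges = new ArrayList<>();

    Vertex(String id) {
        super(id);
    }

    // Creates a labeled, directed edge from this vertex to the target.
    Edge addEdge(String edgeId, String label, Vertex target) {
        Edge edge = new Edge(edgeId, label, this, target);
        outEdges.add(edge);
        return edge;
    }
}

// An edge additionally carries a label denoting the relationship type.
class Edge extends Element {
    final String label;
    final Vertex out; // source vertex
    final Vertex in;  // target vertex

    Edge(String id, String label, Vertex out, Vertex in) {
        super(id);
        this.label = label;
        this.out = out;
        this.in = in;
    }
}

class PropertyGraphDemo {
    // Builds a committer vertex linked to one commit vertex.
    // Identifiers, keys and the label are hypothetical values.
    static Vertex buildSample() {
        Vertex committer = new Vertex("v1");
        committer.properties.put("id", "jdoe");
        Vertex commit = new Vertex("v2");
        commit.properties.put("revision", 42);
        committer.addEdge("e1", "PROPAGATE", commit);
        return committer;
    }

    public static void main(String[] args) {
        Vertex committer = buildSample();
        for (Edge edge : committer.outEdges) {
            System.out.println(edge.out.id + " -[" + edge.label
                + "]-> " + edge.in.id);
        }
    }
}
\end{lstlisting}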

The \emph{Core Data Model} of MSR4J is part of the lower layer of the superstructure.
Using Frames, it provides the node types as annotated Java interfaces to the 
application space, which is part of the upper layer of the superstructure.
Method signatures in these interfaces are getters/setters,
annotated with property and vertex adjacency (i.e., relationship) specifications.
Methods for properties that can be computed on demand, e.g., the number of commits 
or the total number of lines added by a committer, are annotated with Gremlin Groovy queries.
Gremlin is the graph traversal language of the Tinkerpop framework, providing a syntax
for querying the graph data structure. Frames leverages it to express
traversal queries that are more sophisticated than the basic ones specifying
adjacencies.
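
The idea of exposing node types as annotated interfaces can be illustrated with a
small, self-contained sketch based on the standard \texttt{java.lang.reflect.Proxy}
class. This is a simplified illustration of the general technique, not the actual
implementation of Frames: the \texttt{@Prop} annotation, the \texttt{CommitterSketch}
interface and the \texttt{MiniFramer} class are hypothetical names, and a plain map
stands in for a vertex.

\begin{lstlisting}[caption={Illustrative sketch of annotated interfaces backed by a dynamic proxy}]
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a property annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Prop {
    String value();
}

// Hypothetical node type, declared only through annotated accessors.
interface CommitterSketch {
    @Prop("id")
    void setId(String id);

    @Prop("id")
    String getId();
}

class MiniFramer {
    // Wraps a raw property map (standing in for a vertex)
    // in a typed proxy implementing the given interface.
    @SuppressWarnings("unchecked")
    static <T> T frame(Map<String, Object> vertexProperties,
                       Class<T> kind) {
        InvocationHandler handler = (proxy, method, args) -> {
            Prop prop = method.getAnnotation(Prop.class);
            if (prop == null) {
                // Only annotated accessors are supported in this sketch.
                throw new UnsupportedOperationException(
                    method.getName());
            }
            if (method.getName().startsWith("set")) {
                vertexProperties.put(prop.value(), args[0]);
                return null;
            }
            return vertexProperties.get(prop.value());
        };
        return (T) Proxy.newProxyInstance(
            kind.getClassLoader(), new Class<?>[] { kind }, handler);
    }
}

class MiniFramerDemo {
    public static void main(String[] args) {
        Map<String, Object> raw = new HashMap<>();
        CommitterSketch committer =
            MiniFramer.frame(raw, CommitterSketch.class);
        committer.setId("jdoe");
        System.out.println(committer.getId());
    }
}
\end{lstlisting}

In this sketch the application code only manipulates the typed interface, while
every call is translated into a read or write on the underlying property map.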

The code snippet in Listing~\ref{lst:cn} shows an excerpt of the definition
of the Committer node. It contains the declarations of the committer's id property,
the adjacency to the Commit node, and an example of a Gremlin query in Groovy 
that computes the total number of commits by a committer. The \textit{Committer}
interface extends the \textit{VertexFrame} interface, which belongs to the Frames
framework.

\begin{lstlisting}[caption={Excerpt of the Committer node interface},label=lst:cn]
public interface Committer extends VertexFrame {

	@Property("id")
	void setId(String id);

	@Property("id")
	String getId();
	
	// skipped some code...
	
	// Adjacency to Commit
	@Adjacency(label = PropagateRelation.PROPAGATE, 
		direction = Direction.OUT)
	Iterable<Commit> getCommits();

	@Adjacency(label = PropagateRelation.PROPAGATE,
		 direction = Direction.OUT)
	void addCommit(Commit c);

	@Adjacency(label = PropagateRelation.PROPAGATE,
		 direction = Direction.OUT)
	void addCommits(Iterable<Commit> commits);
	
	// skipped some code...
	
	// Total number of commits by this committer.
	@GremlinGroovy(value = "it.outE('label', '" 
	 + PropagateRelation.PROPAGATE 
	 + "').inV.gather{it.size()}", frame = false)
	Integer getNumberOfCommits();
}
\end{lstlisting}

\begin{figure}[!t]
\centering
\includegraphics[scale=0.7]{datamodelPackage}
\caption{Structure of the data model package}
\label{fig:dmp}
\end{figure}

% Diagram of the Core Data Model, Utilities and Application space packages.
\figurename~\ref{fig:dmp} shows the package diagram of the data model.
It consists of separate packages in which nodes, relationships and datatypes 
are defined. Datatypes model enumerations such as
file types, programming languages, repository types, etc.
These will be moved into configuration files in the first stable 
release of MSR4J, so that they can be easily extended without having 
to refactor this part.

The utilities, also part of the lower layer of the superstructure,
contain necessary tools, as depicted in \figurename~\ref{fig:utp}, 
for handling:
\begin{itemize}
	\item basic configurations such as logger, properties manager, concurrency settings, datatypes;
	\item database management services (create, clean, shutdown, remove, transaction); 
	an implementation for Neo4J, including indexing, is provided;
	\item simple parsers to collect data from different text sources; currently the abstract
	definition of an HTML parser is provided;
	\item repositories; currently an SVN working copy fetcher is provided. It relies
	on the SVNKit library to check out and update local copies of SVN repositories.
\end{itemize} 

The application space is where the MSR4J user's application resides.


\subsection{Limitations of the architecture}
\label{sec:limitarchi}
Inheritance is not yet a stable and powerful enough feature in Frames to fully support 
type hierarchies. That is why, in this version of MSR4J, we directly
included properties related to the ASF case study in the Committer interface 
(membership and emeritus membership). Ideally, there should be a domain-specific
ASFCommitter interface (implementing an ASFCommitter node) that extends the Committer interface.
The new interface would handle the mentioned properties.
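
The intended hierarchy could look like the following sketch. It is a hypothetical
illustration only: the Frames annotations are omitted so that the example compiles
without any dependency, and the accessor names (\texttt{isMember}, \texttt{isEmeritus})
as well as the \texttt{InMemoryASFCommitter} class are assumptions, not part of MSR4J.

\begin{lstlisting}[caption={Hypothetical sketch of a domain-specific ASFCommitter interface}]
// Generic committer: only domain-independent properties.
interface Committer {
    String getId();
    void setId(String id);
}

// Domain-specific extension for the ASF case study.
interface ASFCommitter extends Committer {
    boolean isMember();
    void setMember(boolean member);

    boolean isEmeritus();
    void setEmeritus(boolean emeritus);
}

// Plain in-memory implementation, used only to exercise the sketch.
class InMemoryASFCommitter implements ASFCommitter {
    private String id;
    private boolean member;
    private boolean emeritus;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public boolean isMember() { return member; }
    public void setMember(boolean member) { this.member = member; }

    public boolean isEmeritus() { return emeritus; }
    public void setEmeritus(boolean emeritus) {
        this.emeritus = emeritus;
    }
}
\end{lstlisting}

With such a hierarchy, code that only needs generic committer data would depend on
\texttt{Committer}, while ASF-specific analyses would use \texttt{ASFCommitter}.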



\begin{figure}[!t]
\centering
\includegraphics[scale=0.7]{utilsPackage}
\caption{Structure of the utilities package}
\label{fig:utp}
\end{figure}
